FlashQueryFile: Flash-Optimized Layout and Algorithms for Interactive Ad Hoc SQL on Big Data
Abstract
A high-performance storage layer is vital for interactive ad hoc SQL analytics (OLAP style) over Big Data. This paper makes a case for leveraging flash in the Big Data stack to speed up queries. State-of-the-art Big Data layouts and algorithms are optimized for hard disks (i.e., they favor sequential over random access), so they perform suboptimally on flash, whose performance characteristics are drastically different. Although existing columnar and row-columnar layouts reduce disk I/O compared to row-based layouts, they still read a significant amount of columnar data that is irrelevant to the query, because they employ only coarse-grained, intra-columnar data skipping, which does not work across all queries. FlashQueryFile's specialized columnar data layouts, together with its selection and projection algorithms, fully exploit the fast random accesses and high internal I/O parallelism of flash. This enables fast, I/O-efficient query processing and fine-grained, intra-columnar data skipping that minimizes the data read per query. Compared to an HDD-optimized row-columnar data layout and its associated algorithms running on flash, FlashQueryFile achieves 11X-100X TPC-H query speedups and a 38%-99.08% reduction in data read.
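To make the contrast between coarse-grained and fine-grained data skipping concrete, here is a minimal sketch under stated assumptions. It is not FlashQueryFile's actual layout or algorithm. The function names, the range predicate, the row-group size, and the "values read" cost measure are all hypothetical illustrations. The coarse-grained scan skips a whole row group only when the group's min/max statistics rule out every row; otherwise it reads the group of both columns sequentially. The fine-grained scan reads the selection column to find matching row positions, then fetches only those positions from the projection column by random access, which is the access pattern flash serves cheaply.

```python
from typing import List, Tuple


def coarse_grained_scan(sel_col: List[int], proj_col: List[str],
                        low: int, high: int, group: int) -> Tuple[List[str], int]:
    """HDD-style row-group skipping (hypothetical sketch).

    A row group is skipped only if its [min, max] range cannot overlap
    [low, high]; otherwise both columns of the group are read in full.
    """
    out: List[str] = []
    values_read = 0
    for start in range(0, len(sel_col), group):
        sel = sel_col[start:start + group]
        # Min/max statistics let us skip the group only when no row can match.
        if max(sel) < low or min(sel) > high:
            continue
        proj = proj_col[start:start + group]
        # The whole group of both columns is read sequentially.
        values_read += len(sel) + len(proj)
        out.extend(p for s, p in zip(sel, proj) if low <= s <= high)
    return out, values_read


def fine_grained_scan(sel_col: List[int], proj_col: List[str],
                      low: int, high: int) -> Tuple[List[str], int]:
    """Flash-style fine-grained skipping (hypothetical sketch).

    The selection column is scanned once to collect matching row positions;
    the projection column is then read only at those positions.
    """
    rows = [i for i, s in enumerate(sel_col) if low <= s <= high]
    values_read = len(sel_col)
    # Random access to only the needed projection values.
    out = [proj_col[i] for i in rows]
    values_read += len(rows)
    return out, values_read


if __name__ == "__main__":
    sel = [5, 90, 12, 70, 3, 88, 40, 61]  # hypothetical selection column
    proj = list("abcdefgh")               # hypothetical projection column
    print(coarse_grained_scan(sel, proj, 80, 100, group=4))
    print(fine_grained_scan(sel, proj, 80, 100))
```

In this toy input, every row group contains one matching value, so min/max statistics cannot skip anything. The coarse-grained scan reads 16 values, while the fine-grained scan reads 10 to produce the same result. The sketch counts individual values for simplicity. Real devices read at page granularity, so actual savings depend on how matching rows cluster across flash pages.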
Published: 2014